HDDS-8383. Misreplication cannot be resolved with single rack #4539
|
Going through this code in I'm wondering if |
|
Thanks @siddhantsangwan for taking a look.
Good point, it might be a more complete fix, and |
…ad of SCMCommonPlacementPolicy#validateContainerPlacement
@swamirishi @ashishkumar50 can you please take a look?
siddhantsangwan
left a comment
Looks good other than a minor comment.
| stat = new ContainerPlacementStatusDefault(1, 4, 3, 1, Arrays.asList(1, 2)); | ||
| stat = new ContainerPlacementStatusDefault(1, 4, 1, Arrays.asList(1, 2)); | ||
| assertFalse(stat.isPolicySatisfied()); | ||
| assertEquals(2, stat.misReplicationCount()); |
The original code here doesn't make much sense to me because it's saying currentRacks is 1 but the last argument says 1 replica is on 1 rack and 2 replicas on another rack.
swamirishi
left a comment
This PR changes a number of interfaces, altering the definitions of the classes that implement them. It would be better to discuss this first.
| List<? extends Node> sortByDistanceCost(Node reader, | ||
| List<? extends Node> nodes, int activeLen); | ||
|
|
||
| default int getRackCount() { |
Should we have the concept of racks in NetworkTopology? Should this particular function go in the PlacementPolicy class instead?
@sodonnel I remember us discussing this before.
| } | ||
| int maxLevel = networkTopology.getMaxLevel(); | ||
| int numRacks = networkTopology.getNumOfNodes(maxLevel - 1); | ||
| int numRacks = networkTopology.getRackCount(); |
I guess we can move this particular logic into the SCM common placement policy instead of relying on NetworkTopology, since the NetworkTopology class is meant to be more generic and need not understand racks.
| protected int getRequiredRackCount(int numReplicas) { | ||
| return REQUIRED_RACKS; | ||
| int racks = networkTopology != null ? networkTopology.getRackCount() : 1; | ||
| return Math.min(REQUIRED_RACKS, racks); |
From my understanding, this is the only change that fixes the particular issue. If you want to change the other interfaces, I think it would be better to create a separate refactoring Jira.
sodonnel
left a comment
LGTM
@swamirishi would you like to take another look?
swamirishi
left a comment
LGTM
Thanks @siddhantsangwan, @sodonnel, @swamirishi for the review.
What changes were proposed in this pull request?
The default topology contains a single rack. With rack-aware container placement policy (HDDS-8300), overreplication is considered a misreplication, since more replicas are in a single rack than desired. Yet misreplication cannot be resolved since there is no other rack.
This can be reproduced by configuring rack-awareness for integration tests:
and running:
Container placement should be considered valid (not misreplicated) if there is only a single rack. This will let the overreplication logic take care of the extra replicas instead of the misreplication one.
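To illustrate the idea, here is a minimal, self-contained sketch of capping the required rack count by the number of racks that actually exist. It is not the Ozone code: the class name `RackRequirementSketch`, the method names, and the value `REQUIRED_RACKS = 2` are assumptions made for this example only.

```java
// Illustrative sketch only; names and the constant below are hypothetical,
// not taken from the Ozone code base.
class RackRequirementSketch {
  // Assumed value: the rack-aware policy wants replicas on 2 racks.
  static final int REQUIRED_RACKS = 2;

  // Never require more racks than the cluster has (and at least 1).
  static int requiredRackCount(int racksInCluster) {
    return Math.min(REQUIRED_RACKS, Math.max(1, racksInCluster));
  }

  // Number of additional racks the replicas would need to occupy
  // for the placement to be considered valid.
  static int misReplicationCount(int racksUsed, int racksInCluster) {
    return Math.max(0, requiredRackCount(racksInCluster) - racksUsed);
  }

  public static void main(String[] args) {
    // Single-rack cluster: placement is valid, nothing to resolve.
    System.out.println(misReplicationCount(1, 1)); // prints 0
    // Three-rack cluster, all replicas on one rack: one more rack needed.
    System.out.println(misReplicationCount(1, 3)); // prints 1
  }
}
```

With the cap in place, a single-rack cluster never reports misreplication, so any extra replicas are left for the overreplication handling to remove.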
https://issues.apache.org/jira/browse/HDDS-8383
How was this patch tested?
The same integration test passed. Also ran unit tests related to topology and placement policy.
https://github.com/adoroszlai/hadoop-ozone/actions/runs/4617349558